Features of the Vocalocity Voice Browser

The Vocalocity Voice Browser is a server-based software solution that gives companies the ability to integrate enterprise VoiceXML-based applications into a telephony network. It bridges the enterprise tier with the Public Switched Telephone Network (PSTN) or IP network to provide a highly scalable, carrier-class call-time platform for execution of IVR and speech applications.

The Vocalocity Voice Browser terminates the calls from the PSTN/IP network, intelligently selects the appropriate resources for call handling (such as ASR and TTS), and communicates with the Web-tier to execute the application.

The Vocalocity Voice Browser features an open, standards-based architecture that combines best-of-breed third-party technologies, such as automated speech recognition and text-to-speech, with a set of software and tools for managing voice applications and infrastructure. The Vocalocity Voice Browser allows companies to move to next-generation standards and technologies while preserving the ability to migrate existing IVR systems. Built on an n-tier, component-based architecture, the Vocalocity Voice Browser is designed around industry standards such as VoiceXML, SSML, and SRGS, and runs on leading operating systems: Microsoft Windows and Red Hat Linux.

Logical Architecture

The following illustration shows the logical architecture of the Vocalocity Voice Browser.

There are four major high-level logical components of the Vocalocity Voice Browser: Telephony Interface, ASR, TTS, and Interpreter.

Telephony Interface

The telephony component handles communication with the PSTN or IP network, along with all call control and media-related functions, such as answering the call, playing audio prompts, and receiving spoken utterances.

The telephony component interacts with the telephony hardware through a telephony extension point, or TEP.
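The idea of an extension point can be pictured as an adapter interface that hides vendor hardware behind a fixed set of operations. The following Python sketch is purely illustrative: the class names `TelephonyExtensionPoint` and `LoopbackTEP`, and the methods `answer`, `play_prompt`, and `receive_utterance`, are hypothetical names taken from the functions listed above, not the actual Vocalocity TEP API.

```python
from abc import ABC, abstractmethod


class TelephonyExtensionPoint(ABC):
    """Hypothetical adapter interface (assumption, not the real TEP API)."""

    @abstractmethod
    def answer(self, channel_id: str) -> None:
        """Answer an incoming call on the given channel."""

    @abstractmethod
    def play_prompt(self, channel_id: str, audio: bytes) -> None:
        """Play audio to the caller."""

    @abstractmethod
    def receive_utterance(self, channel_id: str) -> bytes:
        """Return audio captured from the caller."""


class LoopbackTEP(TelephonyExtensionPoint):
    """In-memory stand-in for telephony hardware; it only records calls."""

    def __init__(self) -> None:
        self.log = []

    def answer(self, channel_id: str) -> None:
        self.log.append(("answer", channel_id))

    def play_prompt(self, channel_id: str, audio: bytes) -> None:
        self.log.append(("play", channel_id, len(audio)))

    def receive_utterance(self, channel_id: str) -> bytes:
        self.log.append(("receive", channel_id))
        return b""  # no real audio in this sketch
```

Under this assumption, code that drives a call depends only on the abstract interface, so a different hardware vendor means a different adapter class rather than a change to the call-handling logic.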

Automated Speech Recognition (ASR)

The speech recognition component manages the associated application grammars and recognition state. It processes spoken utterances by attempting to match them against a set of known valid inputs, which drive the flow and logic of the application.

Vocalocity provides integrations with leading ASR vendors, including SpeechWorks and LumenVox.
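The matching of an utterance against a set of known valid inputs can be sketched in a few lines of Python. This is not an ASR engine API: the function `match_utterance`, the `GRAMMAR` table, and its phrases are hypothetical, and the sketch compares text, whereas a real engine works on audio and typically returns confidence scores.

```python
from typing import Dict, Optional

# Hypothetical grammar: each known valid phrase maps to the value
# the application acts on. The entries are illustrative only.
GRAMMAR: Dict[str, str] = {
    "check balance": "BALANCE",
    "balance": "BALANCE",
    "transfer funds": "TRANSFER",
    "operator": "OPERATOR",
}


def match_utterance(transcript: str, grammar: Dict[str, str]) -> Optional[str]:
    """Return the value for a recognized phrase, or None if nothing matches."""
    # Normalize case and collapse whitespace before the lookup.
    normalized = " ".join(transcript.lower().split())
    return grammar.get(normalized)
```

A `None` result plays the role of what VoiceXML calls a nomatch event, which the application can handle by reprompting the caller.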

Text-to-Speech (TTS)

The text-to-speech component turns textual output into synthesized audio that can be played back to the user as if it were spoken by a human. Text-to-speech is useful when dynamic content does not lend itself to pre-recording.

Vocalocity provides integrations with many leading TTS engines, including SpeechWorks Speechify and RealSpeak, and VoiceWare VoiceText.

VoiceXML Interpreter

The VoiceXML interpreter manages the dialog state and the application context during a given call and manages the communication back to the application server that is delivering the XML content. Depending on the language, execution of the document will follow the dialog programming standard set forth in the appropriate specification.

The Vocalocity VoiceXML Interpreter supports all required elements in VoiceXML 2.0 and VoiceXML 2.1.
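For readers unfamiliar with the XML content an application server delivers, the sketch below embeds a minimal VoiceXML 2.0 document in a Python string and extracts its fields and prompts with the standard library. The document contents, the `example.com` URL, the grammar file name `menu.grxml`, and the `summarize` helper are all made up for illustration. The code only inspects the document; it does not interpret it the way the VoiceXML Interpreter does.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Illustrative VoiceXML 2.0 document (contents are hypothetical).
DOCUMENT = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="choice">
      <prompt>Say balance or transfer.</prompt>
      <grammar src="menu.grxml" type="application/srgs+xml"/>
      <filled>
        <submit next="http://example.com/handle_choice" namelist="choice"/>
      </filled>
    </field>
  </form>
</vxml>
"""


def summarize(document: str) -> dict:
    """List the version, field names, and prompt texts found in a document."""
    root = ET.fromstring(document)
    ns = {"v": VXML_NS}
    fields = [f.get("name") for f in root.findall(".//v:field", ns)]
    prompts = [p.text.strip() for p in root.findall(".//v:prompt", ns) if p.text]
    return {"version": root.get("version"), "fields": fields, "prompts": prompts}
```

In this example the `prompt` element is the kind of text the TTS component would synthesize, and the `grammar` element references the valid inputs the ASR component would load.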

Additional Components

In addition to the major components, the Vocalocity Voice Browser includes several lower-level components that model the entire interaction of the system, resources, and components in the Vocalocity Voice Browser.

In most cases, you will not work directly with or modify these components; they are part of the underlying architecture.

Component

Purpose

Call ID Generator

Generates a unique ID for each call. This globally unique identifier is commonly called a GUID.

Call Router

Maps an incoming call to a specific URI, which in turn delivers the necessary XML for the call.

Set up call routing in Vocalocity Control Center.

Call Manager

Manages a call and maps the appropriate resources – such as ASR, TTS and Interpreter – to the call.

Channel Manager

Manages one or more physical or logical channels. A channel defines an endpoint that can receive a call, originate a call, or both.

Set up channels in Vocalocity Control Center (as part of configuring the instance).

Caching Manager

Caches content used by the Interpreter during a call.

The Caching Manager keeps the most frequently used content local, while preserving the dynamic aspects of any voice application and abiding by the caching rules of the HTTP and VoiceXML specifications.

A distributed version of the Caching Manager can be used in larger networks where multiple gateways would benefit from one or more larger caching servers instead of spreading the disk requirements across individual servers.
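As a rough illustration of the Call ID Generator and Call Router entries above, the following Python sketch generates a GUID per call and looks up an application URI. Everything specific here is an assumption: routing is keyed on the dialed number only for the sake of the example, the phone numbers and `example.com` URIs are placeholders, and the function names are hypothetical. In the product, call routing is configured in Vocalocity Control Center rather than in code like this.

```python
import uuid
from typing import Dict

# Hypothetical routing table: dialed number -> URI that serves the XML.
ROUTES: Dict[str, str] = {
    "8005550100": "http://example.com/apps/banking/start.vxml",
    "8005550101": "http://example.com/apps/support/start.vxml",
}
DEFAULT_URI = "http://example.com/apps/default/start.vxml"


def new_call_id() -> str:
    """Return a random GUID (UUID version 4) as a string."""
    return str(uuid.uuid4())


def route_call(dialed_number: str,
               routes: Dict[str, str] = ROUTES,
               default: str = DEFAULT_URI) -> str:
    """Return the URI mapped to the dialed number, or a fallback URI."""
    return routes.get(dialed_number, default)
```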

Expanded View of the Vocalocity Voice Browser

In the diagram in the Logical Architecture section, the Vocalocity Voice Browser combines the telephony and voice (ASR, TTS, and Interpreter) building blocks of a voice application system.

The following illustration expands on the high-level logical architecture diagram. It shows the various layers that make up the Vocalocity Voice Browser (bounded in blue) and how they integrate with the required third-party voice and telephony components.

Note: The application server and application components are not included in this illustration.

The Vocalocity Voice Browser (bounded by a light blue box) includes the following layers:

• Dialog, or interpreter, layer

• Middleware layer – the VocalOS Communication Framework

• Integration layer – APIs for integrating with voice browser components, as well as delivered "managers" that handle common processing aspects

• Extension points – custom adapters (or interfaces) to vendor-provided telephony and speech components

Also included are the vendor-specific ASR and TTS engines, as well as the telephony hardware.

Dialog or Interpreter Layer

The Vocalocity Voice Browser includes an XML interpreter that conforms to all required VoiceXML standards. In addition, an open-source VoiceXML interpreter, called OpenVXI, is available from Vocalocity.

Integration Layer

The integration layer ties the Vocalocity Voice Browser to the required telephony, ASR, and TTS components.

Vocalocity provides APIs that enable developers to create extension points that handle messaging between the Vocalocity Voice Browser and those third-party components.

In addition to the Vocalocity APIs, these four components (or managers) handle common processing aspects. These components are internal to the Vocalocity system; it is unlikely that you will need to customize them.

Component

Purpose

Audio Manager

Handles the playing of audio prompts and the recording of audio (for example, voice mail messages).

DTMF Handler

Handles the recognition of DTMF digits entered by the caller.

Dialog Manager

Manages the requesting and releasing of dialog sessions, which are responsible for all the components necessary for a call. These include a document interpreter, a recognition (reco) session, a TTS session, a media session, a call session, and a channel session.

The Dialog Manager is an internal component; although it manages the Interpreters, it is not the same as the Dialog layer in the illustration.

Media Manager

Manages the storage, retrieval, and caching of media – raw data retrieved from a URI-addressable location such as a web server or file.

The Media Manager’s primary responsibility is to deliver the media to the requestor.
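The retrieve-and-cache behavior described for the Media Manager can be pictured with a small Python sketch. It is a simplification under stated assumptions, not the Vocalocity implementation: the `MediaCache` class is hypothetical, the fetch function is injected by the caller, and every entry gets the same fixed lifetime, whereas real HTTP caching derives freshness from per-response headers such as Cache-Control and Expires.

```python
import time
from typing import Callable, Dict, Tuple


class MediaCache:
    """Toy URI-keyed cache with one fixed freshness lifetime (assumption)."""

    def __init__(self, fetch: Callable[[str], bytes], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # retrieves raw data for a URI
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, uri: str) -> bytes:
        """Deliver media for a URI, using the local copy while it is fresh."""
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None:
            stored_at, data = entry
            if now - stored_at < self._max_age:
                return data            # fresh: serve from the cache
        data = self._fetch(uri)        # missing or stale: go to the source
        self._entries[uri] = (now, data)
        return data
```

Passing the fetch function and the clock as parameters keeps the sketch independent of any particular HTTP library and makes the expiry behavior easy to exercise.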